-
Notifications
You must be signed in to change notification settings - Fork 175
[rust] add support for accesing the list of line offsets #3860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Oh, I see -- the doc comment on prism/include/prism/util/pm_newline_list.h Lines 24 to 40 in 6881fd5
The indices in the list are actually the indices for the beginning of each line, not of the actual newlines in the source string. OK, that makes the tests make a little more sense, but in that case, I think the function name should be something more like |
3880eda to
ee562a8
Compare
|
Moving this to draft; I think if we're going to provide line offsets, we might as well go ahead and just expose an iterator over the lines instead, since that's what people are going to be doing anyway. |
|
The list of line offsets is useful for quite a few other things, which probably will be useful in Rust too: prism/lib/prism/parse_result.rb Lines 79 to 169 in 38c64e5
Agreed the name pm_newline_list_t is confusing 😅Would need to fix that without breaking compatibility though. |
|
Going to submit this for review as-is; we can add |
| /// Returns a slice containing the offsets of the start of each line in the source string | ||
| /// that was parsed. | ||
| #[must_use] | ||
| pub fn line_offsets(&self) -> &'pr [usize] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to expose this, but I'd prefer to keep the naming consistent. (Right now, newline_list is newline_list everywhere except in rust where it's line_offsets.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can appreciate consistency, but I think "newline_list" is a bad name regardless, and a function named *_list that actually returns a slice seems very un-Rusty. Sacrifices do have to be made in cross-language projects, though...
Would you take a pre-work PR to change the naming here -- either for the fields, the underlying data structure, or all of the above -- prior to landing this? This seems like a good time to rename things as #3772 would likely require significant changes to clients anyway in the next release, and trying to do all the breaking changes all at once seems like a reasonable thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, yeah that's fine. Let's rename it across the board. We won't be able to merge unless we do the corresponding work in CRuby, but I'm okay to do that if you can take the prism side of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I can do that, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, newline_list is newline_list everywhere except in rust where it's line_offsets.
FWIW I checked and it's called ParseResult#offsets in Ruby:
prism/lib/prism/parse_result.rb
Lines 42 to 43 in 54a5dd3
| # The list of newline byte offsets in the source code. | |
| attr_reader :offsets |
And it's already called lineOffsets in Java:
| private int[] lineOffsets = null; |
As per subject.
rubyfmtspends a non-trivial amount of time figuring out where the newlines are in the source string, and since the parser already has this information, it seems silly to have to recompute it.I'm glad I wrote tests, though, because the implementation seems surprising to me -- I don't think I ever would have expected the list to contain offsets for newlines that aren't actually present in the source string. I guess it's too late to change the contract here? (I considered making this code return
list[1..], but I think the consistency between it and the C implementation is more important; happy to reconsider or to get the underlying library changed.)cc @reese